[DPE-7316] Stereo mode unified charm #1630
Draft
dragomirp wants to merge 102 commits into stereo-mode-additive-code from
Add a lightweight witness/voter charm that participates in Raft consensus to provide quorum in 2-node PostgreSQL clusters without storing any PostgreSQL data.

Key components:
- Watcher charm with Raft controller integration
- Health checking for PostgreSQL endpoints
- Relation interface (postgresql_watcher) for PostgreSQL operator
- Topology and health check actions

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
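For readers less familiar with why a third voter is needed, the majority arithmetic can be sketched as below. This is a minimal illustration only; `quorum_size` and `survives_failures` are hypothetical helper names and are not part of the charm or of pysyncobj.

```python
def quorum_size(voters: int) -> int:
    """Smallest number of voters that forms a strict majority."""
    return voters // 2 + 1


def survives_failures(voters: int, failed: int) -> bool:
    """True if the voters still reachable can form a majority."""
    return voters - failed >= quorum_size(voters)


if __name__ == "__main__":
    # Two PostgreSQL units alone: losing one unit loses quorum.
    print(survives_failures(voters=2, failed=1))
    # Two PostgreSQL units plus one watcher: one failure is tolerated.
    print(survives_failures(voters=3, failed=1))
```

With two voters the majority is two, so any single failure blocks leader election; adding the watcher as a third voter keeps the majority at two while tolerating one failure.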
… pysyncobj Raft service

Add a standalone raft_service.py that implements a KVStoreTTL-compatible Raft node managed as a systemd service, eliminating the dependency on the charmed-postgresql snap. Remove automatic health checks in favor of on-demand checks via an action, since the watcher lacks PostgreSQL credentials.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
…tereo mode tests

Replace cut_network_from_unit_without_ip_change with cut_network_from_unit in stereo mode integration tests. The iptables-based approach with REJECT was still causing timeouts; removing the interface entirely triggers faster TCP connection failures. Add use_ip_from_inside=True for check_writes, since restored units get new IPs. Also add a spread task for stereo mode tests.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Add the Raft member proactively during an IP change to prevent race conditions where a member restarts Patroni before being added to the cluster. Implement watcher removal from Raft on relation departure to maintain correct quorum calculations. Add an idempotency check before adding the watcher to Raft. Use fresh peer IPs for Raft member addition instead of cached values. Update stereo mode tests with iptables-based network isolation and Raft health verification.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
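The idempotency check described above could look roughly like the following. This is a sketch under stated assumptions: `ensure_raft_member`, its parameters, and the example addresses are hypothetical, and the real charm may query and modify Raft membership differently.

```python
from typing import Callable, Iterable


def ensure_raft_member(
    current_members: Iterable[str],
    address: str,
    add_member: Callable[[str], None],
) -> bool:
    """Add `address` to the Raft cluster only if it is not already a member.

    `add_member` stands in for whatever call actually performs the
    addition (hypothetical). Returns True if an addition was issued.
    """
    if address in set(current_members):
        # Already present: repeating the add would be redundant.
        return False
    add_member(address)
    return True


if __name__ == "__main__":
    added = []
    members = ["10.0.0.1:2222", "10.0.0.2:2222"]  # placeholder addresses
    print(ensure_raft_member(members, "10.0.0.3:2222", added.append))
    print(ensure_raft_member(members + added, "10.0.0.3:2222", added.append))
```

Passing a freshly read member list on each call, rather than a cached one, matches the intent of using fresh peer IPs.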
…o tests

Build the watcher charm automatically if not found and deploy charms sequentially instead of concurrently to improve reliability.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
- Add idempotency check to skip deployment if already in expected state
- Clean up unexpected state before redeploying to avoid test pollution
- Add wait_for_idle after replica shutdown to allow cluster stabilization

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
…fy_raft_cluster_health call

- Add use_ip_from_inside=True to test_watcher_network_isolation to handle stale IPs
- Fix verify_raft_cluster_health call in test_health_check_action to pass required arguments

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Add __expire_keys and _onTick methods to WatcherKVStoreTTL to match Patroni's KVStoreTTL behavior. When the watcher becomes the Raft leader (e.g., when the PostgreSQL primary is network-isolated), it must expire stale leader keys so that a replica can acquire leadership.

Without this fix, the watcher would become Raft leader but wouldn't process TTL expirations, causing the old Patroni leader key to remain valid and preventing failover.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Juju action results require hyphenated keys (e.g., 'healthy-count') rather than underscored keys. Fixed the health check action to use proper key format and updated test expectations.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
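One way to avoid this class of bug is to normalize keys just before setting action results. The helper below is a sketch only; `to_action_results` is a hypothetical name and is not necessarily how the charm fixes it.

```python
def to_action_results(results: dict) -> dict:
    """Return a copy of `results` with underscores in keys replaced by hyphens."""
    return {key.replace("_", "-"): value for key, value in results.items()}


if __name__ == "__main__":
    print(to_action_results({"healthy_count": 2, "total_count": 3}))
```

The returned dictionary could then be passed to the action's result setter in place of the raw one.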
…sues
- Add watcher PostgreSQL user for health check authentication:
  - Create 'watcher' user with password via relation secret
  - Add pg_hba.conf entry for watcher IP in patroni.yml template
  - Pass password from relation secret to health checker
- Fix lint issues:
  - Extract S3 initialization to _handle_s3_initialization() to reduce _on_peer_relation_changed complexity from 11 to 10
  - Use absolute paths for subprocess commands (/usr/bin/systemctl, etc.)
  - Update type hints to use modern syntax (X | None vs Optional[X])
  - Fix line length formatting issues
- Fix unit test failures:
  - Add missing mocks in test_update_member_ip for endpoint methods
  - Add _units_ips mock in test_update_relation_data_leader
- Fix integration test:
  - Add check_watcher_ip parameter to verify_raft_cluster_health() to handle watcher IP changes after network isolation tests
- Update watcher charm to handle IP changes:
  - Add _update_unit_address_if_changed() for IP change detection
  - Call from config-changed and update-status events
Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Remove outdated constraint about deploy order being critical for stereo mode with Raft DCS. Testing confirmed that 2 PostgreSQL units can now be deployed simultaneously without causing split-brain. Also update deprecated relate() calls to integrate().

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
* add new-tab-link extension and increase linkcheck timeout
  Signed-off-by: andreia <andreia.velasco@canonical.com>
* replace mentions of old Juju password actions with Juju secrets
  Signed-off-by: andreia <andreia.velasco@canonical.com>
* update links to 16 repo and remove mention of 14 bundle
  Signed-off-by: andreia <andreia.velasco@canonical.com>
* update instructions for secrets retrieval

---------

Signed-off-by: andreia <andreia.velasco@canonical.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
* refactor home page
* fix missing refs
* add new stable releases to releases.md
* invert order (newest to oldest)
* Update release in refresh docs
* correct architecture for 990, 989
* correct arch for 952, 951

---------

Co-authored-by: Carl Csaposs <carl.csaposs@canonical.com>
Integrate the watcher charm as a mode within the main PostgreSQL charm, following the MongoDB pattern of using a config `role` option to alternate between "postgresql" (default) and "watcher" modes.

Key changes:
- Add `role` config option (postgresql|watcher), immutable after deploy
- Rename provides relation `watcher` to `watcher-offer` for PostgreSQL mode
- Add requires relation `watcher` for watcher mode
- Branch charm __init__ based on role: watcher mode skips snap install, Patroni, backups, TLS, etc. and only runs Raft + health checker
- Move watcher source files (raft_controller, raft_service, watcher_health) into main src/
- Create WatcherRequirerHandler for watcher-mode event handling
- Persist role in peer databag and block on role change attempts
- Update integration tests for unified charm deployment

Deploy example:

    juju deploy postgresql pg
    juju deploy postgresql pg-watcher --config role=watcher
    juju relate pg:watcher-offer pg-watcher:watcher

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
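The "immutable after deploy" rule for `role` can be sketched as a pure function over the configured value and the value persisted in the peer databag. This is an illustration under assumptions: `resolve_role`, `VALID_ROLES`, and the message wording are hypothetical and may not match the charm's implementation.

```python
from __future__ import annotations

VALID_ROLES = ("postgresql", "watcher")
DEFAULT_ROLE = "postgresql"


def resolve_role(configured: str, persisted: str | None) -> tuple[str, str | None]:
    """Return (effective_role, blocked_message).

    `persisted` is the role stored at first deploy (None if nothing is
    stored yet). `blocked_message` is None when the unit may proceed.
    """
    if configured not in VALID_ROLES:
        return persisted or DEFAULT_ROLE, f"invalid role '{configured}'"
    if persisted is None:
        # First run: accept the configured role; the caller persists it.
        return configured, None
    if configured != persisted:
        # Role changed after deploy: keep the original role and block.
        return persisted, f"role cannot be changed from '{persisted}' to '{configured}'"
    return persisted, None


if __name__ == "__main__":
    print(resolve_role("watcher", None))
    print(resolve_role("postgresql", "watcher"))
```

Keeping the decision in a pure function makes the block-on-change behavior easy to unit test without a Juju model.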
The @trace_charm decorator expects the tracing_endpoint attribute to exist after __init__. In watcher mode we return early, so set it to None.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
* Limit bucket listing to find the timelines
* Add ceph pitr test
* Switch back to recurse
* Refactor tests
* Fix imports
* Fix tests
* Reduce boto logs
* Typo
* Cleanup config code
* Merge update sync config in the bulk patch call
* Add storage-hot-standby-feedback and durability-maximum-lag-on-failover
* Fix default
* Remove extra patch
* Update to spec
* Move TLS transfer to single kernel
* Switch to released lib
* add instructions for custom usernames to integration guide
* Update docs/how-to/integrate-with-another-application.md
  Co-authored-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
  Signed-off-by: Andreia <andreia.velasco@canonical.com>

---------

Signed-off-by: Andreia <andreia.velasco@canonical.com>
Co-authored-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
…ailable) (#1318)

* DPE-8980 Support Juju 4: use 'ip' databag field (overwrites 'private-address')

  Juju 4 has removed support for the databag fields `private-address`, `ingress-address` and more. The field to use now is `ip`. The PG16 charm still has to support Juju 3.6 LTS, so this adds support for the `ip` field with backward compatibility.

  Users can deploy it on Juju 4 using:

  > juju deploy postgresql --channel 16/edge --force

* Address comments in PR
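The backward-compatible lookup described in this commit might be sketched as follows. `unit_address` is a hypothetical helper operating on a plain dictionary that stands in for a unit databag; the charm's actual code may differ.

```python
from __future__ import annotations

from typing import Mapping


def unit_address(databag: Mapping[str, str]) -> str | None:
    """Prefer the `ip` field, falling back to the legacy `private-address`."""
    for field in ("ip", "private-address"):
        value = databag.get(field)
        if value:
            return value
    return None


if __name__ == "__main__":
    # Databag carrying only the newer field.
    print(unit_address({"ip": "10.0.0.5"}))
    # Databag carrying only the legacy field.
    print(unit_address({"private-address": "10.0.0.6"}))
```

Checking `ip` first means it takes precedence whenever both fields are present, which matches the "overwrites" wording in the commit title.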
        content = secret.get_content(refresh=True)
        return content.get("raft-password")
    except SecretNotFoundError:
        logger.warning(f"Secret {secret_id} not found")
4a5a1af to 56bc4ce
0abd2aa to 439e345
1678751 to 80d1906
5ab2b49 to b4f5c14
879bede to 9aaafc7
        content = secret.get_content(refresh=True)
        return content.get("watcher-password")
    except SecretNotFoundError:
        logger.warning(f"Secret {secret_id} not found")